Developing a distributed electronic health-record store for India
The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.
Shuffling with a Croupier: Nat-Aware Peer-Sampling
Despite much recent research on peer-to-peer (P2P) protocols for the Internet, there have been relatively few practical protocols designed to explicitly account for Network Address Translation gateways (NATs). Those P2P protocols that do handle NATs circumvent them using relaying and hole-punching techniques to route packets to nodes residing behind NATs. In this paper, we present Croupier, a peer sampling service (PSS) that provides uniform random samples of nodes in the presence of NATs in the network. It is the first NAT-aware PSS that works without the use of relaying or hole-punching. By removing the need for relaying and hole-punching, we decrease the complexity and overhead of our protocol as well as increase its robustness to churn and failure. We evaluated Croupier in simulation, and, in comparison with existing NAT-aware PSSs, our results show similar randomness properties, but improved robustness in the presence of both high percentages of nodes behind NATs and massive node failures. Croupier also has substantially lower protocol overhead.
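For readers unfamiliar with peer sampling, the following is a minimal sketch of a generic gossip view exchange, the kind of building block a PSS is based on. It is not Croupier's protocol: it ignores NATs entirely, and the function names, the `exchange_size` and `view_size` parameters, and the uniform-random trimming policy are all assumptions made for illustration.

```python
import random


def merge(view, received, self_id, view_size, rng):
    """Combine the current view with received descriptors (hypothetical policy)."""
    # Union the two lists, dropping duplicates and the node's own id.
    candidates = sorted((set(view) | set(received)) - {self_id})
    if len(candidates) <= view_size:
        return candidates
    # Assumption: trim back to view_size uniformly at random.
    return sorted(rng.sample(candidates, view_size))


def gossip_round(views, exchange_size, view_size, rng):
    """One round in which every node swaps part of its view with one neighbour."""
    for node in sorted(views):
        if not views[node]:
            continue
        partner = rng.choice(views[node])
        # Each side sends a random subset of its view plus its own id.
        to_partner = rng.sample(
            views[node], min(exchange_size, len(views[node]))
        ) + [node]
        to_node = rng.sample(
            views[partner], min(exchange_size, len(views[partner]))
        ) + [partner]
        views[node] = merge(views[node], to_node, node, view_size, rng)
        views[partner] = merge(views[partner], to_partner, partner, view_size, rng)
```

Starting from a ring such as `views = {i: [(i + 1) % 20, (i + 2) % 20] for i in range(20)}` and calling `gossip_round` repeatedly mixes the views while keeping each one bounded by `view_size`.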
Gozar: NAT-friendly Peer Sampling with One-Hop Distributed NAT Traversal
Gossip-based peer sampling protocols have been widely used as a building block for many large-scale distributed applications. However, Network Address Translation gateways (NATs) cause most existing gossiping protocols to break down, as nodes cannot establish direct connections to nodes behind NATs (private nodes). In addition, most of the existing NAT traversal algorithms for establishing connectivity to private nodes rely on third-party servers running at well-known, public IP addresses. In this paper, we present Gozar, a gossip-based peer sampling service that: (i) provides uniform random samples in the presence of NATs, and (ii) enables direct connectivity to sampled nodes using a fully distributed NAT traversal service, where connection messages require only a single hop to connect to private nodes. We show in simulation that Gozar preserves the randomness properties of a gossip-based peer sampling service. We show the robustness of Gozar when a large fraction of nodes reside behind NATs and also in catastrophic failure scenarios. For example, if 80% of nodes are behind NATs, and 80% of the nodes fail, more than 92% of the remaining nodes stay connected. In addition, we compare Gozar with existing NAT-friendly gossip-based peer sampling services, Nylon and ARRG. We show that Gozar is the only system that supports one-hop NAT traversal, and its overhead is roughly half of Nylon's.
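A figure such as "more than 92% of the remaining nodes stay connected" can be read as the share of surviving nodes that belong to the largest connected cluster of the overlay. The sketch below computes that share under this reading, which is an assumption: the abstract does not state how connectivity was measured. The function name and the `views` and `alive` inputs are hypothetical, view entries are treated as undirected links, and NAT reachability is ignored.

```python
from collections import deque


def largest_component_fraction(views, alive):
    """Fraction of alive nodes that sit in the largest connected cluster."""
    if not alive:
        return 0.0
    # Build an undirected graph restricted to surviving nodes.
    adjacency = {node: set() for node in alive}
    for node in alive:
        for neighbour in views[node]:
            if neighbour in alive:
                adjacency[node].add(neighbour)
                adjacency[neighbour].add(node)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this cluster.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            current = queue.popleft()
            size += 1
            for nxt in adjacency[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best / len(alive)
```

In a failure experiment, `alive` would be the set of nodes left after removing the failed ones, and `views` the partial views they held at the moment of failure.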
GLive: The Gradient overlay as a market maker for mesh-based P2P live streaming
Peer-to-Peer (P2P) live video streaming over the Internet is becoming increasingly popular, but it is still plagued by problems of high playback latency and intermittent playback streams. This paper presents GLive, a distributed market-based solution that builds a mesh overlay for P2P live streaming. The mesh overlay is constructed such that (i) nodes with increasing upload bandwidth are located closer to the media source, and (ii) nodes with similar upload bandwidth become neighbours. We introduce a market-based approach that matches nodes willing and able to share the stream with one another. However, market-based approaches converge slowly on random overlay networks, and we improve the rate of convergence by adapting our market-based algorithm to exploit the clustering of nodes with similar upload bandwidths in our mesh overlay. We address the problem of free-riding through nodes preferentially uploading more of the stream to the best uploaders. We compare GLive with our previous tree-based streaming protocol, Sepidar, and NewCoolstreaming in simulation, and our results show significantly improved playback continuity and playback latency.
Converging an Overlay Network to a Gradient Topology
In this paper, we investigate the topology convergence problem for the
gossip-based Gradient overlay network. In an overlay network where each node
has a local utility value, a Gradient overlay network is characterized by the
properties that each node has a set of neighbors with the same utility value (a
similar view) and a set of neighbors containing higher utility values (gradient
neighbor set), such that paths of increasing utilities emerge in the network
topology. The Gradient overlay network is built using gossiping and a
preference function that samples from nodes using a uniform random peer
sampling service. We analyze it using tools from matrix analysis, and we prove
both the necessary and sufficient conditions for convergence to a complete
gradient structure, as well as estimating the convergence time and providing
bounds on worst-case convergence time. Finally, we show in simulations the
potential of the Gradient overlay, by building a more efficient live-streaming
peer-to-peer (P2P) system than one built using uniform random peer sampling.
Comment: Submitted to 50th IEEE Conference on Decision and Control (CDC 2011)
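The two neighbour sets described in the abstract can be illustrated with a small preference function. This is only a sketch under stated assumptions: the abstract does not give the actual preference function, so the rule used here (keep the candidates closest in utility for the similar view, and the closest strictly higher ones for the gradient set) and the names `select_gradient_neighbors`, `candidates`, and `set_size` are hypothetical.

```python
def select_gradient_neighbors(own_utility, candidates, set_size):
    """Pick a similar view and a gradient neighbour set from sampled peers.

    candidates maps node id -> utility, e.g. peers returned by a peer
    sampling service. Ties are broken by node id to keep results stable.
    """
    # Similar view: peers whose utility is closest to our own.
    by_closeness = sorted(
        candidates, key=lambda n: (abs(candidates[n] - own_utility), n)
    )
    similar_view = by_closeness[:set_size]

    # Gradient set: peers with strictly higher utility, nearest first
    # (an assumed policy, not taken from the paper).
    higher = [n for n in candidates if candidates[n] > own_utility]
    higher.sort(key=lambda n: (candidates[n] - own_utility, n))
    gradient_set = higher[:set_size]

    return similar_view, gradient_set
```

Applied repeatedly to fresh samples, a rule of this kind tends to pull each node's links toward peers of equal or slightly higher utility, which is how paths of increasing utility can emerge.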
Cloud-native RStudio on Kubernetes for Hopsworks
In order to fully benefit from cloud computing, services are designed
following the "multi-tenant" architectural model, which is aimed at maximizing
resource sharing among users. However, multi-tenancy introduces challenges of
security, performance isolation, scaling, and customization. RStudio server is
an open-source Integrated Development Environment (IDE) accessible over a web
browser for the R programming language. We present the design and
implementation of a multi-user distributed system on Hopsworks, a
data-intensive AI platform, following the multi-tenant model that provides
RStudio as Software as a Service (SaaS). We use the most popular cloud-native
technologies: Docker and Kubernetes, to solve the problems of performance
isolation, security, and scaling that are present in a multi-tenant
environment. We further enable secure data sharing in RStudio server instances
to provide data privacy and allow collaboration among RStudio users. We
integrate our system with Apache Spark, which can scale and handle Big Data
processing workloads. Also, we provide a UI where users can provide custom
configurations and have full control of their own RStudio server instances. Our
system was tested on a Google Cloud Platform cluster with four worker nodes,
each with 30GB of RAM allocated to them. The tests on this cluster showed that
44 RStudio servers, each with 2GB of RAM, can be run concurrently. Our system
can scale out to potentially support hundreds of concurrently running RStudio
servers by adding more resources (CPUs and RAM) to the cluster or system.
Comment: 8 pages, 4 figures
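The reported test figures can be checked with simple arithmetic. The numbers below come from the abstract; the variable names are illustrative, and the abstract does not say what the remaining memory was used for, so no explanation is assumed here.

```python
# Figures stated in the abstract.
worker_nodes = 4
ram_per_node_gb = 30
rstudio_servers = 44
ram_per_server_gb = 2

# Total memory allocated to the worker nodes.
total_ram_gb = worker_nodes * ram_per_node_gb

# Memory requested by all concurrently running RStudio servers.
rstudio_ram_gb = rstudio_servers * ram_per_server_gb

# Share of worker memory taken by RStudio servers, and what is left over.
share = rstudio_ram_gb / total_ram_gb
leftover_gb = total_ram_gb - rstudio_ram_gb

print(f"{rstudio_ram_gb} GB of {total_ram_gb} GB ({share:.1%}), {leftover_gb} GB left")
```

So the 44 servers account for 88 GB of the 120 GB allocated across the four worker nodes.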
Digital Atlas of American Religion
Poster abstract: Our poster presentation will introduce DAAR, the Digital Atlas of American Religion (http://www.religionatlas.org). DAAR is a web-based research platform with innovative data exploration and visualization tools to support research in the humanities.
Time and location are essential components of humanities exploratory research; however, GIS technology, especially in its web form, does not support the easy exploration and visualization of the complex spatio-temporal data manipulated by humanists. DAAR presents researchers with an integrated solution stemming from several fields including GIS, visualization, and classification theory.
Researchers using DAAR are provided with the following exploration/visualization techniques: maps, cartograms, tree maps, pie charts, and motion charts. Using these tools and methods, researchers can explore patterns, trends, and relationships in the data that are not otherwise apparent with traditional GIS or statistical software. DAAR allows researchers to understand the multiple dimensions and diversity of religion across geographies, or within geographies. Paired with historic census data, it allows them to explore relationships to give better context and meaning to the patterns and trends. Maps provide the spatial patterns and relationships, tree maps show relative strength and relationships, charts show trends, cartograms reveal relative numbers of adherents, and motion charts animate trends over time.